Abstract



A Bayesian method of moments/instrumental variable (BMOM/IV) 
approach is developed and applied in the analysis of the important mean and 
multiple regression models. Given a single set of data, it is shown how to 
obtain posterior and predictive moments without the use of likelihood 
functions, prior densities and Bayes' Theorem. The posterior and predictive 
moments, based on a few relatively weak assumptions, are then used to 
obtain maximum entropy densities for parameters, realized error terms and 
future values of variables. Posterior means for parameters and realized error 
terms are shown to be equal to certain well known estimates and rationalized 
in terms of quadratic loss functions. Conditional maxent posterior densities 
for means and regression coefficients given scale parameters are in the 
normal form while scale parameters' maxent densities are in the exponential 
form. Marginal densities for individual regression coefficients, realized error 
terms and future values are in the Laplace or double-exponential form with 
heavier tails than normal densities with the same means and variances. It is 
concluded that these results will be very useful, particularly when there is 
difficulty in formulating appropriate likelihood functions and prior densities 
needed in traditional maximum likelihood and Bayesian approaches. 



1 INTRODUCTION 

In the traditional likelihood and Bayesian approaches, it is usually assumed that enough 
information is available to formulate a likelihood function and, in the Bayesian approach, a 
prior density for the parameters of the selected likelihood function; see e.g. Jeffreys (1988),
Box and Tiao (1973), Berger (1985), Geisser (1993), Press (1989), and Zellner (1971).
However, if not enough information is available to specify a form for the likelihood function, 
then clearly there will be problems in both the traditional likelihood and Bayesian 
approaches. In situations like this, some resort to non-likelihood based methods, say least
squares regression, method of moments or bootstrap approaches that are "data-based." That
is, least squares is usually justified by its producing the best fit to a given sample of data
with no appeal to sampling properties. However, if appropriate sampling assumptions are
made, unbiasedness and a variance-covariance matrix of the least squares estimator can be
produced without specifying a complete likelihood function. Similarly, the well-known
Gauss-Markov theorem's assumptions provide certain optimality properties for the least
squares estimator without introducing a likelihood function. These general results, obtained
under certain sampling assumptions, have been widely utilized. However, they do not yield
conditional probability statements about possible values of parameters based on a single set 
of given observations. In the present paper, it will be shown how such conditional results 
can be obtained without introducing sampling assumptions, likelihood functions and prior 
densities. Moments of parameters and future values of variables will be derived and shown 
to have certain optimal properties based on just a single set of observed data. 

Further, to obtain a posterior distribution for parameters given just their moments, a
maximum entropy (maxent) approach (see e.g. Jaynes (1982a,b) and Cover and Thomas
(1991)) will be utilized since this provides a most conservative choice of density that
incorporates information in moment side conditions. However, if enough information is
available to formulate a tentative likelihood function and a prior density for its parameters, 
the results of a traditional Bayesian analysis can be compared and/or combined with those 
yielded by the BMOM approach. See Green and Strawderman (1994) for an application of 
the BMOM approach in the analysis of a natural resource model with an unknown likelihood 
function. 

The plan of the paper is as follows. In Section 2, an analysis of a simple scalar mean 
process is presented since this is a central, important case and analysis of it reveals well the 
essential features of the BMOM approach. Then in Section 3, results for multiple regression 
and autoregressive models are given. Section 4 includes a summary of results and
indications of future research. 

2 ANALYSIS OF A SCALAR MEAN PROBLEM 

In this section, we assume that n given observations $y' = (y_1, y_2, \ldots, y_n)$ have been
obtained that relate to a scalar mean, $\theta$, as follows:

$$y_i = \theta + u_i, \qquad i = 1, 2, \ldots, n \tag{2.1}$$

where the $u_i$'s are unobserved, realized error terms; see Chaloner and Brant (1988), Zellner
(1975) and Zellner and Moulton (1985) for traditional Bayesian analyses of realized error
terms. Note that we have made the important assumption that the errors, say measurement
errors, are additive. Since $\theta$ and the $u_i$'s are unobserved quantities with unknown values,
we shall assume that we can view their possible values probabilistically. That is, given the
data $y$, we assume that the means of $\theta$ and the $u_i$'s as well as other features of their
distributions exist but have unknown values. Note that a realized error term, $u_i$, usually does
not have a zero mean. Operations and assumptions introduced below will enable us to
express these posterior moments and other quantities as functions of the given data.

2.1 First Order Posterior and Predictive Moments 

Given the observations' values in (2.1), both $\theta$ and the realized error terms, the $u_i$'s, have
fixed unknown values that we shall regard as subjectively random just as was done in
Chaloner and Brant (1988), Zellner (1975), and Zellner and Moulton (1985). Here, in 
contrast to the work in these papers, we shall assume that not enough information is 
available to formulate a likelihood function and a prior density and thus it is not possible to 
use Bayes' theorem. From (2.1) we have 

$$\bar{y} = \theta + \bar{u} \tag{2.2}$$

where $\bar{y} = \sum_{i=1}^{n} y_i/n$, a given sample mean, and $\bar{u} = \sum_{i=1}^{n} u_i/n$, the mean of the realized error
terms. The symbol E denotes a posterior expectation operator, so-called because it is utilized
after the data have been observed. From (2.2), we have

$$\bar{y} = E\theta|D + E\bar{u}|D \tag{2.3}$$

where D denotes the given data, here $(y_1, y_2, \ldots, y_n)$.
We now introduce the following assumption.

Assumption I: $E\bar{u}|D = \sum_{i=1}^{n} Eu_i|D/n = 0$ (2.4)

This assumption indicates that we believe that there is nothing systematic in the realized
error terms and thus the expectation of their mean is equal to zero. If, for example, we
believe that the i'th observation is an additive outlier, we would have $u_i = \eta + \varepsilon_i$ with the
mean of the parameter $\eta$, $E\eta|D \neq 0$ and then $E\bar{u}|D \neq 0$.
Given Assumption I in (2.4), we have from (2.3),

$$E\theta|D = \bar{y} \tag{2.5}$$

that is, the posterior mean of $\theta$ is the sample mean $\bar{y}$. Note that this result has been obtained
without selecting a likelihood function and a prior density for its parameters as is done in 
many analyses. 

Given the result in (2.5), we have on taking the expectation of both sides of (2.1), the
posterior expectation of $u_i$, the i'th realized error term is:

$$Eu_i|D = y_i - E\theta|D = y_i - \bar{y} \tag{2.6}$$

which is the deviation of $y_i$ from the mean $\bar{y}$. On summing both sides of (2.6) and dividing
by n, we have $\sum_{i=1}^{n} Eu_i|D/n = \sum_{i=1}^{n} (y_i - \bar{y})/n = 0$, the "sample analogue" of Assumption I.

Further, note that if we seek an optimal point estimate, $\hat{\theta}$, relative to a quadratic loss
function $L(\theta,\hat{\theta}) = (\theta-\hat{\theta})^2$, we have $EL(\theta,\hat{\theta}) = E(\theta-E\theta)^2 + (\hat{\theta}-E\theta)^2$ and as is well known,
taking $\hat{\theta} = E\theta$ is optimal. Thus the estimate, $\bar{y}$ in (2.5), is optimal relative to squared error
loss.
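The decomposition of expected quadratic loss used above can be checked in a few lines. The following is a sketch of the standard argument, writing $\theta$ for the mean parameter and $\hat{\theta}$ for a candidate point estimate; the cross term vanishes because $E(\theta - E\theta) = 0$.

```latex
\begin{aligned}
E\,L(\theta,\hat{\theta})
  &= E\big[(\theta - E\theta) + (E\theta - \hat{\theta})\big]^2 \\
  &= E(\theta - E\theta)^2 + 2(E\theta - \hat{\theta})\,E(\theta - E\theta) + (E\theta - \hat{\theta})^2 \\
  &= E(\theta - E\theta)^2 + (\hat{\theta} - E\theta)^2 .
\end{aligned}
```

The first term does not depend on $\hat{\theta}$, so expected loss is minimized by setting the second term to zero, that is, by choosing $\hat{\theta} = E\theta$.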

If $y_{n+1}$ is a future, as yet unobserved value satisfying $y_{n+1} = \theta + u_{n+1}$, where $u_{n+1}$ is
a future, as yet unrealized error term, we can write $Ey_{n+1}|D = E\theta|D + Eu_{n+1}|D$ and assume
that $Eu_{n+1}|D = 0$. Then

$$Ey_{n+1}|D = E\theta|D = \bar{y} \tag{2.7}$$

which is the mean of the future observation $y_{n+1}$ given the data $y$ and the information in
Assumption I. The sample mean $\bar{y}$ is an optimal point prediction for $y_{n+1}$ relative to a squared
error predictive loss function.

The first order moments in (2.5), (2.6) and (2.7) are identical to those obtained in a 
traditional normal likelihood function, diffuse prior analysis. However (2.5), (2.6) and (2.7) 
have been obtained without an iid normality assumption and without an improper diffuse 
prior. 
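As a minimal sketch of (2.5), (2.6) and (2.7), the following Python fragment computes the first order BMOM moments for a small sample. The function name and the data values are illustrative assumptions and do not come from the paper.

```python
# Illustrative sketch of the first order BMOM moments (2.5)-(2.7).
# The function name and the data are hypothetical.

def first_order_moments(y):
    """Return the posterior mean of theta, the posterior means of the
    realized errors, and the predictive mean, all under Assumption I."""
    n = len(y)
    ybar = sum(y) / n                 # (2.5): E(theta | D) is the sample mean
    u_hat = [yi - ybar for yi in y]   # (2.6): E(u_i | D) = y_i - ybar
    y_next = ybar                     # (2.7): E(y_{n+1} | D) = ybar
    return ybar, u_hat, y_next


y = [4.0, 6.0, 5.0, 9.0]
theta_mean, u_hat, y_next = first_order_moments(y)
```

The posterior means of the realized errors sum to zero by construction, which is the sample analogue of Assumption I noted in the text.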

2.2 Second Order Posterior and Predictive Moments 



Having derived the posterior and predictive means in (2.5) and (2.7), we now turn to 
consider derivation of second order posterior and predictive moments. 

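From (2.1) in vector notation $y = \iota\theta + u$, where $\iota' = (1, 1, \ldots, 1)$ is a 1 x n vector of ones,
we have $\hat{u} = y - \iota\bar{y} = [I - \iota(\iota'\iota)^{-1}\iota']y = [I - \iota(\iota'\iota)^{-1}\iota']u$ and thus

$$u - \hat{u} = \iota(\iota'\iota)^{-1}\iota'u = \iota(\iota'\iota)^{-1}\iota'(u - \hat{u}) \tag{2.8}$$

where $\iota'\hat{u} = 0$ has been employed in (2.8). Then,

$$V(u|D) = E(u-\hat{u})(u-\hat{u})'|D = \iota(\iota'\iota)^{-1}\iota'\,E(u-\hat{u})(u-\hat{u})'|D\;\iota(\iota'\iota)^{-1}\iota' \tag{2.9}$$

is a functional equation that $V(u|D)$ must satisfy, where D denotes given sample and prior
information. Thus we introduce the following assumption.

Assumption II: $V(u|\sigma^2,D) = E(u-\hat{u})(u-\hat{u})'|\sigma^2,D = \iota(\iota'\iota)^{-1}\iota'\sigma^2$ (2.10)

with $\sigma^2$ a positive scalar parameter, which satisfies (2.9). Note from $y_i = \theta + u_i$ and $y_i = \bar{y} + \hat{u}_i$
that $u_i - \hat{u}_i = \bar{y} - \theta$. Also, $u_j - \hat{u}_j = \bar{y} - \theta$ and thus $u_i - \hat{u}_i = u_j - \hat{u}_j$ for all i, j. Thus it is not
surprising that in (2.10) all variances are the same and all correlations equal 1. Then from
$\bar{y} - \theta = (\iota'\iota)^{-1}\iota'(y - \iota\theta) = (\iota'\iota)^{-1}\iota'(u - \hat{u})$, we have

$$\begin{aligned}
V(\theta|\sigma^2,D) = E(\theta-\bar{y})^2|\sigma^2,D &= (\iota'\iota)^{-1}\iota'\,E(u-\hat{u})(u-\hat{u})'|\sigma^2,D\;\iota(\iota'\iota)^{-1} \\
&= (\iota'\iota)^{-1}\sigma^2 = \sigma^2/n
\end{aligned} \tag{2.11}$$

where Assumption II has been used for $E(u-\hat{u})(u-\hat{u})'|\sigma^2,D$ in the first line of (2.11). Thus
$\sigma^2/n$ is the posterior variance for $\theta$ given the parameter $\sigma^2$ and the data.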

In addition, $E(u-\hat{u})'(u-\hat{u})|D,\sigma^2 = \mathrm{tr}\,\iota(\iota'\iota)^{-1}\iota'\sigma^2 = \sigma^2$ from (2.10). Then
$E(u-\hat{u})'(u-\hat{u})|D/n = Eu'u/n|D - \hat{u}'\hat{u}/n = E\sigma^2|D/n$, where $Eu|D = \hat{u}$ has been used. If we define
$E\sigma^2|D = E\sum_{i=1}^{n}(u_i - E\bar{u})^2/n|D = Eu'u/n|D$, since $E\bar{u}|D = 0$ by Assumption I, we have
$E\sigma^2|D - \hat{u}'\hat{u}/n = E\sigma^2|D/n$ or

$$E\sigma^2|D = \hat{u}'\hat{u}/(n-1) = s^2 \tag{2.12}$$

which is the posterior mean of $\sigma^2$ and thus $V(\theta|D) = s^2/n$.

For the future observation $y_{n+1} = \theta + u_{n+1}$, $Ey_{n+1} = \bar{y}$, given $E\theta = \bar{y}$ and $Eu_{n+1}|D,\sigma^2 = 0$.
Also, $y_{n+1} - \bar{y} = \theta - \bar{y} + u_{n+1}$ and

$$E[(y_{n+1}-\bar{y})^2|\sigma^2,D] = E(\theta-\bar{y})^2|\sigma^2,D + Eu_{n+1}^2|\sigma^2,D = \sigma^2(1+1/n) \tag{2.13}$$

given that $u_{n+1}$ and $\theta-\bar{y}$ are uncorrelated, (2.11) and $Eu_{n+1}^2|D,\sigma^2 = \sigma^2$.

A traditional diffuse prior, normal likelihood approach produces results identical to
those in (2.11) and (2.13). However, the traditional approach yields $E(\sigma^2|D) = \nu s^2/(\nu-2)$ for
$\nu > 2$, where $\nu = n-1$, rather than (2.12), $E(\sigma^2|D) = s^2$. Also $V(\theta|D) = s^2/n$ in the BMOM
approach, whereas in the traditional Bayesian approach $V(\theta|D) = \nu s^2/(\nu-2)n$. Further, (2.11),
(2.12) and (2.13) are obtained without an iid normality assumption and an improper prior
density.
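The second order results can be sketched numerically as follows. This is an illustrative Python fragment, not the paper's own code: the function name and data are assumptions, and the unconditional predictive variance is obtained by replacing $\sigma^2$ in (2.13) with its posterior mean $s^2$.

```python
# Illustrative sketch of the second order BMOM moments; names and data are hypothetical.

def second_order_moments(y):
    n = len(y)
    ybar = sum(y) / n
    rss = sum((yi - ybar) ** 2 for yi in y)   # residual sum of squares
    s2 = rss / (n - 1)                        # (2.12): E(sigma^2 | D)
    var_theta = s2 / n                        # V(theta | D) = s^2 / n
    var_pred = s2 * (1 + 1 / n)               # (2.13) with sigma^2 replaced by s^2
    nu = n - 1
    # Diffuse prior, normal likelihood counterpart; it exists only for nu > 2.
    trad_var_theta = nu * s2 / ((nu - 2) * n) if nu > 2 else None
    return s2, var_theta, var_pred, trad_var_theta


s2, var_theta, var_pred, trad_var_theta = second_order_moments([4.0, 6.0, 5.0, 9.0])
```

For this small sample the traditional posterior variance is three times the BMOM value, which reflects the factor $\nu/(\nu-2)$ with $\nu = 3$.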

2.3 Derivation of Posterior and Predictive Densities 

Above we have derived the first two posterior moments of $\theta$ given $\sigma^2$, namely $E\theta|D = \bar{y}$
and $\mathrm{Var}(\theta|\sigma^2,D) = \sigma^2/n$. Also, $E\sigma^2|D = s^2 = \hat{u}'\hat{u}/(n-1)$, $V(\theta|D) = s^2/n$, $Ey_{n+1}|D = \bar{y}$
and $\mathrm{Var}(y_{n+1}|\sigma^2,D) = (1+1/n)\sigma^2$. It is well known that maxent densities can be derived that
incorporate the information in moment conditions; see e.g. Jaynes (1982a,b), Cover and
Thomas (1991) and Zellner (1991, 1993). That is, we choose a density, say f(x), to
maximize $H(f) = -\int f(x)\ln f(x)\,dx$ subject to $\int x^i f(x)\,dx = \mu_i$, i = 0, 1, 2, ..., with $\mu_0 = 1$,
where we have utilized uniform measure in defining entropy, H(f). See Shore and Johnson
(1980) for consistency, invariance and other desirable properties of this entropy-based
procedure. Here we have the following results.

A. The proper maxent posterior density for $\theta$ given $\sigma^2$ is a normal density with mean
$\bar{y}$ and variance $\sigma^2/n$, i.e. $g_N(\theta|\sigma^2,D) \sim N(\bar{y},\sigma^2/n)$.

B. The proper maxent predictive density for $y_{n+1}$ given $\sigma^2$ and D with mean $\bar{y}$ and
variance $\sigma^2(1+1/n)$ is a normal density $h_N(y_{n+1}|\sigma^2,D)$ with these moments, $N[\bar{y},\sigma^2(1+1/n)]$.

C. The proper maxent posterior density for $\sigma^2$ with $E\sigma^2 = s^2$ is the exponential density
$g_e(\sigma^2|D) = (1/s^2)\exp\{-\sigma^2/s^2\}$, $0 < \sigma^2 < \infty$, with $s^2$ given in (2.12).

D. From A and C, the joint posterior density for $\theta$ and $\sigma^2$ is

$$f(\theta,\sigma^2|D) = g_N(\theta|\sigma^2,D)\,g_e(\sigma^2|D) \tag{2.14}$$

which is a maxent density given that $\theta/\sigma$ and $\sigma$ are assumed independent.$^1$ On integrating
over $\sigma^2$, $0 < \sigma^2 < \infty$, the marginal posterior density for $\theta$ is (see Appendix for the derivation)

$$g(\theta|D) = \sqrt{n/2s^2}\,\exp\{-\sqrt{2n}\,|\theta-\bar{y}|/s\}, \qquad -\infty < \theta < \infty \tag{2.15}$$

This double-exponential or Laplace marginal posterior density$^2$ for $\theta$ is symmetric about
$\bar{y}$, the mean, median and modal value, with variance equal to $s^2/n$ and has thick tails relative
to normal or Student-t densities. By a change of variable, $z = \sqrt{n}(\theta-\bar{y})/s$, a standardized
form of (2.15) is: $g(z|D) = (1/\sqrt{2})\exp\{-\sqrt{2}|z|\}$, $-\infty < z < \infty$. See Appendix for further
properties of this density.

$^1$ A maxent density without this independence assumption has also been derived.

$^2$ See Stigler (1986, p. 111) for a fascinating discussion of Laplace's derivation of this distribution using the
"principle of insufficient reason."

E. From B and C, the joint density for $y_{n+1}$ and $\sigma^2$ is

$$h_N(y_{n+1}|\sigma^2,D)\,g_e(\sigma^2|D) \tag{2.16}$$

where $h_N \sim N[\bar{y}, \sigma^2(1+1/n)]$ and $g_e$ is the exponential density shown in C. On integrating
(2.16) with respect to $\sigma^2$, $0 < \sigma^2 < \infty$, the marginal predictive density of $y_{n+1}$ is in the
following double-exponential form,

$$h_e(y_{n+1}|D) = (1/s_e\sqrt{2})\exp\{-\sqrt{2}\,|y_{n+1}-\bar{y}|/s_e\}, \qquad -\infty < y_{n+1} < \infty \tag{2.17}$$

where $s_e^2 = (1+1/n)s^2$. The mean, median and modal value of (2.17) are all equal to $\bar{y}$ and
its variance is $s_e^2$. As with the density in (2.15), the tails of the predictive density can be
rather thick.

F. The proper maxent posterior density for $u_i = y_i-\theta$, the i'th realized error term with
mean $\hat{u}_i$ and variance $\sigma^2/n$ is given by

$$g_N(u_i|\sigma^2,D) = \sqrt{n/2\pi\sigma^2}\,\exp\{-(u_i-\hat{u}_i)^2 n/2\sigma^2\}, \qquad -\infty < u_i < \infty \tag{2.18}$$

G. The marginal posterior density for $u_i$, obtained by integration from the joint density,
$g_N(u_i|\sigma^2,D)\,g_e(\sigma^2|D)$ is

$$g(u_i|D) = \sqrt{n/2s^2}\,\exp\{-\sqrt{2n}\,|u_i-\hat{u}_i|/s\}, \qquad -\infty < u_i < \infty \tag{2.19}$$

a double-exponential density centered at $\hat{u}_i$, the i'th residual.

H. If it is known that $y_i > 0$ for all i, as with "time to failure" or "duration" data,
$0 < \theta < \infty$ and the maxent proper density for $\theta$ subject just to $E\theta = \bar{y} > 0$ is the following
exponential density

$$g_x(\theta|D) = (1/\bar{y})\exp\{-\theta/\bar{y}\}, \qquad 0 < \theta < \infty \tag{2.20}$$

The posterior and predictive densities above can readily be implemented in practice. 
As indicated below, they can be compared and/or combined with posterior and predictive 
densities derived from assumed likelihood functions and prior densities using Bayes' 
theorem. 
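To illustrate how the double-exponential densities in (2.15) and (2.17) might be evaluated in practice, here is a minimal Python sketch. The function names, the numerical values of $\bar{y}$, $s^2$ and n, and the simple trapezoid rule are all assumptions made for illustration; the numerical integration only checks that the coded density has unit mass and variance $s^2/n$.

```python
import math

def laplace_pdf(x, mean, var):
    """Double-exponential density with the given mean and variance,
    in the form used in (2.15) and (2.17)."""
    sd = math.sqrt(var)
    return math.exp(-math.sqrt(2.0) * abs(x - mean) / sd) / (sd * math.sqrt(2.0))

def posterior_pdf(theta, ybar, s2, n):
    return laplace_pdf(theta, ybar, s2 / n)              # (2.15)

def predictive_pdf(y_next, ybar, s2, n):
    return laplace_pdf(y_next, ybar, s2 * (1 + 1 / n))   # (2.17)

def trapezoid(f, lo, hi, steps=20000):
    """Plain trapezoid rule on an equally spaced grid."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Hypothetical inputs: sample mean, s^2 and sample size.
ybar, s2, n = 6.0, 14.0 / 3.0, 4
sd = math.sqrt(s2 / n)
lo, hi = ybar - 40 * sd, ybar + 40 * sd
mass = trapezoid(lambda t: posterior_pdf(t, ybar, s2, n), lo, hi)
var = trapezoid(lambda t: (t - ybar) ** 2 * posterior_pdf(t, ybar, s2, n), lo, hi)
```

Writing both densities through one helper parameterized by mean and variance keeps the posterior and predictive cases in step, since they differ only in the variance term.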

3 BMOM ANALYSIS OF MULTIPLE REGRESSION AND AUTOREGRESSION MODELS

The BMOM analysis of multiple regression and autoregression models is quite similar
to that utilized in analyzing the scalar mean model in the previous section. The given
n x 1 observation vector $y$ is assumed to satisfy the following n well known equations,

$$y = X\beta + u \tag{3.1}$$

where X is a given n x k matrix of rank k, $\beta$ is a k x 1 vector of regression coefficients with
unknown values and u is an n x 1 vector of realized error terms. If (3.1) relates to an
autoregression, it is assumed that the initial values of the output variable, say $(y_0, y_{-1}, \ldots,
y_{-(q-1)})'$ for a q'th order autoregression, are known and are elements of the X matrix. Given
that it is assumed that we do not have enough information to formulate a likelihood function
with much confidence, the BMOM approach can be employed to derive posterior and
predictive moments and densities.

3.1 Derivation of First Order Posterior and Predictive Moments 

To derive first order moments, we multiply both sides of (3.1) by $(X'X)^{-1}X'$ and take
the posterior expectation of both sides to obtain, as in connection with (2.2),

$$(X'X)^{-1}X'y = E\beta|D + (X'X)^{-1}X'Eu|D \tag{3.2}$$

Now paralleling Assumption I above in (2.4), the following Assumption I' is
introduced.

Assumption I': $(X'X)^{-1}X'Eu|D = 0$ (3.3)

If one of the columns of X is a column of ones, that is, there is an intercept in (3.1), (3.3)
includes Assumption I, namely $E\bar{u}|D = \sum_{i=1}^{n} Eu_i|D/n = 0$. Further, (3.3) implies that there is
nothing "systematic" in $Eu|D$ that is correlated with the columns of X. For example, this
would not be the case if it were believed that some important variable or variables had been
omitted from the relation in (3.1) or if it were believed that the values of the independent
variables in X were measured with error. In these two cases, the $u_i$'s would contain
components that would be expected to be correlated with columns of X and (3.3) would not
be expected to hold. However, if it is believed that (3.3) is valid, then clearly from (3.2) the
posterior mean, $E\beta$, is given by

$$E\beta|D = (X'X)^{-1}X'y = \hat{\beta} \tag{3.4}$$

which is the least squares quantity, a rather simple result.$^3$ Further, given a quadratic loss
function, $L(\beta,\tilde{\beta}) = (\beta-\tilde{\beta})'Q(\beta-\tilde{\beta})$, where Q is a given positive definite symmetric matrix, the
value of $\tilde{\beta}$ that minimizes posterior expected loss is the posterior mean in general and given
in (3.4) for this specific problem.

From (3.4), the mean of the realized error vector is given by

$$Eu|D = y - XE\beta|D = y - X\hat{\beta} = \hat{u} \tag{3.5}$$

where $\hat{u}$ is the least squares residual vector and we have $X'\hat{u} = 0$, the sample analogue of
Assumption I' in (3.3).

$^3$ If, for example, rather than (3.1), we have $y = X\beta + Z\gamma + \varepsilon$, that is $u = Z\gamma + \varepsilon$, then
$(X'X)^{-1}X'u = (X'X)^{-1}X'Z\gamma + (X'X)^{-1}X'\varepsilon$ and $\hat{\beta} = E\beta|D + (X'X)^{-1}X'ZE\gamma|D$ given $(X'X)^{-1}X'E\varepsilon|D = 0$.
Thus $E\beta|D \neq \hat{\beta}$ for $X'Z \neq 0$ and $E\gamma|D \neq 0$.

As regards the predictive mean, we have for a future observation, $y_f$, assumed to satisfy
$y_f = x_f'\beta + u_f$, with the 1 x k vector $x_f'$ given and $u_f$ a random, as yet unrealized error term
assumed drawn from a distribution with a zero mean (equal to $E\bar{u}|D = 0$, as assumed above
in Assumption I'). Then the predictive mean is

$$Ey_f|D = x_f'E\beta|D = x_f'\hat{\beta} = \hat{y}_f \tag{3.6}$$

which is just the least squares point prediction. Given a squared error predictive loss 
function, the predictive mean in (3.6) minimizes the expectation of such a predictive loss 
function. 
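The first order regression results can be illustrated for the simplest case of an intercept and one regressor (k = 2). The Python sketch below is an assumption-laden illustration: the function name, the data and the future regressor value are hypothetical, and the closed-form least squares expressions stand in for the general $(X'X)^{-1}X'y$.

```python
# Illustrative sketch of (3.4)-(3.6) for X = [1, x], k = 2; names and data are hypothetical.

def ols_intercept_slope(x, y):
    """Least squares for y_i = b0 + b1 * x_i + u_i."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]
b0, b1 = ols_intercept_slope(x, y)                      # (3.4): posterior mean of beta
u_hat = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # (3.5): posterior mean of u

# Sample analogue of Assumption I': residuals are orthogonal to both columns of X.
ones_dot_u = sum(u_hat)
x_dot_u = sum(xi * ui for xi, ui in zip(x, u_hat))

x_f = 5.0
y_f_mean = b0 + b1 * x_f                                # (3.6): predictive mean
```

Both orthogonality sums are zero up to rounding error, which mirrors the statement that $X'\hat{u} = 0$ is the sample analogue of Assumption I'.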

Having obtained the first order moments above, we now turn to derive second order 
moments. 

3.2 Derivation of Second Order Posterior and Predictive Moments 

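From (3.1) and (3.4), we have $\hat{\beta} - \beta = (X'X)^{-1}X'(u-\hat{u})$, where $\hat{\beta} = E\beta|D$ and $\hat{u} = y-X\hat{\beta}$ with
$X'\hat{u} = 0$. Thus the posterior covariance matrix, denoted by $V(\beta|D)$, is

$$V(\beta|D) = E(\beta-\hat{\beta})(\beta-\hat{\beta})'|D = (X'X)^{-1}X'\,E(u-\hat{u})(u-\hat{u})'|D\;X(X'X)^{-1} \tag{3.7}$$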

To evaluate (3.7) another assumption is needed, paralleling that in (2.10). Given that $\hat{u} =
(I - X(X'X)^{-1}X')y = (I - X(X'X)^{-1}X')u$, we have that

$$u - \hat{u} = X(X'X)^{-1}X'(u-\hat{u}) \tag{3.8}$$

and

$$E(u-\hat{u})(u-\hat{u})'|D = X(X'X)^{-1}X'\,E(u-\hat{u})(u-\hat{u})'|D\;X(X'X)^{-1}X' \tag{3.9}$$

is a functional equation that must be satisfied. Note that only k elements of u are free.
Thus, we introduce the following assumption, the analogue of (2.10):

Assumption II': $V(u|\sigma^2,D) = E(u-\hat{u})(u-\hat{u})'|\sigma^2,D = X(X'X)^{-1}X'\sigma^2$ (3.10)

where $\sigma^2$ is a positive constant.$^4$ On inserting the expression in (3.10) in (3.9), it is seen
that the functional equation is satisfied for any given value of $\sigma^2$. Substituting from (3.10)
in (3.7), we have

$$V(\beta|\sigma^2,D) = (X'X)^{-1}\sigma^2 \tag{3.11}$$

which is the posterior covariance matrix for $\beta$ given $\sigma^2$ and D, the data.

To obtain a value for the posterior expectation of $\sigma^2$, from (3.10), $E(u-\hat{u})'(u-\hat{u})|\sigma^2,D
= \mathrm{tr}\,X(X'X)^{-1}X'\sigma^2 = k\sigma^2$. Then $Eu'u/n|D - \hat{u}'\hat{u}/n = kE\sigma^2|D/n$ and if we define $E\sigma^2|D
= E\sum_{i=1}^{n}(u_i - E\bar{u})^2/n|D = Eu'u/n|D$, since $E\bar{u}|D = 0$ from Assumption I', $E\sigma^2|D - \hat{u}'\hat{u}/n =
kE\sigma^2|D/n$ or

$$E\sigma^2|D = \hat{u}'\hat{u}/(n-k) = s^2 \tag{3.12}$$

which is the posterior mean of $\sigma^2$.

$^4$ If we write $y = XHH^{-1}\beta + u = Z\theta + u$, where $Z = XH$, H is a square k x k non-singular matrix such that
$H'X'XH = I_k$, and $\theta = H^{-1}\beta$ is a k x 1 vector, then $Z'y = \theta + Z'u$ and $E\theta|D = Z'y$ given $Z'Eu|D = 0$. Also,
if we assume that the k error terms in $Z'u$ have equal variances and are mutually uncorrelated, this implies
(3.10) and (3.11).

For the future observation, $y_f = x_f'\beta + u_f$, $Ey_f|D = x_f'\hat{\beta} = \hat{y}_f$, as shown above. Then
$y_f - \hat{y}_f = x_f'(\beta-\hat{\beta}) + u_f$ and

$$E(y_f-\hat{y}_f)^2|\sigma^2,D = (1 + x_f'(X'X)^{-1}x_f)\sigma^2 \tag{3.13}$$

given that $u_f$ and the elements of $\beta-\hat{\beta}$ have zero covariances and $Eu_f|D = 0$.
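A short numerical sketch of (3.11), (3.12) and (3.13) for an intercept-plus-slope model follows. The function name and data are hypothetical, and the 2 x 2 inverse is written out by hand so that the fragment needs no external library; $\sigma^2$ is replaced by its posterior mean $s^2$ when the covariance matrix and predictive variance are evaluated.

```python
# Illustrative sketch of (3.11)-(3.13) for X = [1, x], k = 2; names and data are hypothetical.

def bmom_regression_second_moments(x, y, x_f):
    n, k = len(y), 2
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    # X'X = [[n, sx], [sx, sxx]]; invert it with the 2x2 adjugate formula.
    det = n * sxx - sx * sx
    xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    b0 = xtx_inv[0][0] * sy + xtx_inv[0][1] * sxy
    b1 = xtx_inv[1][0] * sy + xtx_inv[1][1] * sxy
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - k)                                      # (3.12)
    cov_beta = [[v * s2 for v in row] for row in xtx_inv]   # (3.11) with sigma^2 = s^2
    # Quadratic form x_f'(X'X)^{-1}x_f with x_f' = (1, x_f).
    q = xtx_inv[0][0] + 2 * x_f * xtx_inv[0][1] + x_f * x_f * xtx_inv[1][1]
    var_pred = (1 + q) * s2                                 # (3.13) with sigma^2 = s^2
    return (b0, b1), s2, cov_beta, var_pred


beta_hat, s2, cov_beta, var_pred = bmom_regression_second_moments(
    [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0], 5.0
)
```

The divisor n - k in the estimate of $s^2$ arises here from the trace argument in the text rather than from a degrees-of-freedom correction in a sampling model.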

Given the above posterior and predictive moments, the following are maxent posterior 
and predictive densities that incorporate the information in these moments. 

A'. The proper maxent posterior density for $\beta$ given $\sigma^2$ and the data with mean $\hat{\beta} =
(X'X)^{-1}X'y$ and covariance matrix $(X'X)^{-1}\sigma^2$, is a multivariate normal density, $g_N(\beta|\sigma^2,D)
\sim MVN(\hat{\beta}, (X'X)^{-1}\sigma^2)$.

B'. The proper maxent predictive density for $y_f$ given $\sigma^2$, $x_f$ and the data is a normal
density, $h_N(y_f|\sigma^2,D) \sim N[\hat{y}_f, (1 + x_f'(X'X)^{-1}x_f)\sigma^2]$.

C'. The proper maxent density for $\sigma^2$ with $E\sigma^2 = s^2 = \hat{u}'\hat{u}/(n-k)$ is the exponential
density $g_e(\sigma^2|D) = (1/s^2)\exp(-\sigma^2/s^2)$, $0 < \sigma^2 < \infty$.

D'. From A' and C', the joint posterior density for $\beta$ and $\sigma^2$ is$^5$

$$f(\beta,\sigma^2|D) = g_N(\beta|\sigma^2,D)\,g_e(\sigma^2|D) \tag{3.14}$$

which is a maxent density given that $\beta/\sigma$ and $\sigma$ are independent. The marginal posterior
density of a single element of $\beta$, say $\beta_i$, can be obtained by integrating (3.14) with respect
to the remaining elements of $\beta$ and then with respect to $\sigma^2$. The result is the following
double exponential density for $\beta_i$:

$$g(\beta_i|D) = (1/s_i\sqrt{2})\exp\{-\sqrt{2}\,|\beta_i-\hat{\beta}_i|/s_i\}, \qquad -\infty < \beta_i < \infty \tag{3.15}$$

where $\hat{\beta}_i$ is the i'th element of $\hat{\beta}$ and $s_i^2$ is the (i,i)'th element of $(X'X)^{-1}s^2$. Also, from
(3.14), the marginal distribution of $\eta = \ell'\beta$, where $\ell$ is a given vector of rank one, is

$$g(\eta|D) = (1/s_\eta\sqrt{2})\exp\{-\sqrt{2}\,|\eta-\hat{\eta}|/s_\eta\}, \qquad -\infty < \eta < \infty \tag{3.16}$$

where $\hat{\eta} = \ell'\hat{\beta}$ and $s_\eta^2 = \ell'(X'X)^{-1}\ell\,s^2$.

E'. From B' and C', the joint density for $y_f$ and $\sigma^2$ is

$$h_N(y_f|\sigma^2,D)\,g_e(\sigma^2|D) \tag{3.17}$$

where $h_N \sim N[\hat{y}_f, (1 + x_f'(X'X)^{-1}x_f)\sigma^2]$ and $g_e$ is the exponential density shown in C'. On
integrating (3.17) with respect to $\sigma^2$, the marginal predictive density for $y_f$ is the following
double-exponential density:

$$h_f(y_f|D) = (1/s_e\sqrt{2})\exp\{-\sqrt{2}\,|y_f-\hat{y}_f|/s_e\}, \qquad -\infty < y_f < \infty \tag{3.18}$$

The mean of this density is $\hat{y}_f$ and its variance is $s_e^2 = [1 + x_f'(X'X)^{-1}x_f]s^2$.

F'. The proper maxent posterior density for $u_i$ with given mean $\hat{u}_i$ and given variance
$v_i = x_i'(X'X)^{-1}x_i\sigma^2$, where $x_i'$ is the i'th row of X, is

$$g_N(u_i|\sigma^2,D) = (1/\sqrt{2\pi v_i})\exp\{-(u_i-\hat{u}_i)^2/2v_i\} \tag{3.19}$$

G'. The marginal posterior density for $u_i$, obtained by integrating the joint density
$g_N(u_i|\sigma^2,D)\,g_e(\sigma^2|D)$ with respect to $\sigma^2$ is:

$$g_e(u_i|D) = (1/\sqrt{2\bar{v}_i})\exp\{-\sqrt{2}\,|u_i-\hat{u}_i|/\sqrt{\bar{v}_i}\}, \qquad -\infty < u_i < \infty \tag{3.20}$$

a double-exponential density, where $\bar{v}_i = x_i'(X'X)^{-1}x_i s^2$ is $v_i$ evaluated at $\sigma^2 = s^2$.

$^5$ To compute the joint density of the elements of $\beta$, draw $\sigma^2$ from $g_e(\sigma^2|D)$ and insert the drawn value in
$g_N(\beta|\sigma^2,D)$ and draw $\beta$ from this multivariate normal density. Thus draws from the joint density $f(\beta,\sigma^2|D)$ are
obtained by repeated use of this procedure.

H'. If it is known that $\theta = \ell'\beta$ is strictly positive, where $\ell'$ is a given 1 x k vector
of rank one, the maxent density for $\theta$ subject just to $E\theta|D = \ell'\hat{\beta} > 0$, is the following
exponential density:

$$g(\theta|D) = (1/\ell'\hat{\beta})\exp\{-\theta/\ell'\hat{\beta}\}, \qquad 0 < \theta < \infty \tag{3.21}$$

3.3 Use of Additional Prior Information



If in addition to the sample information $y = X\beta + u$, we represent additional prior
information by use of a conceptual sample, $y_c = X_c\beta + u_c$, where $y_c$ is an $n_c$ x 1 vector of
conceptual data points, $X_c$ is a given $n_c$ x k matrix, $\beta$ is a k x 1 vector of regression parameters
and $u_c$ is an $n_c$ x 1 vector of realized conceptual error terms, we can write

$$\begin{pmatrix} y \\ y_c \end{pmatrix} = \begin{pmatrix} X \\ X_c \end{pmatrix}\beta + \begin{pmatrix} u \\ u_c \end{pmatrix} \tag{3.22a}$$

or

$$w = W\beta + \varepsilon \tag{3.22b}$$

Then making Assumptions I' and II' relative to the system in (3.22b), the posterior mean
of $\beta$ is

$$\begin{aligned}
\bar{\beta} = E\beta|D' &= (W'W)^{-1}W'w \\
&= (X_c'X_c + X'X)^{-1}(X_c'y_c + X'y) \\
&= (X_c'X_c + X'X)^{-1}(X_c'X_c\hat{\beta}_c + X'X\hat{\beta})
\end{aligned} \tag{3.23}$$

where, when $X_c$ is of full column rank, $\hat{\beta}_c = (X_c'X_c)^{-1}X_c'y_c$ is a prior mean vector, $\hat{\beta} =
(X'X)^{-1}X'y$ is the least squares estimate and $X_c'X_c$ is assigned a value by the investigator.
Then, with Assumption II' relating to (3.22b), we have

$$E(\beta-E\beta)(\beta-E\beta)'|D',\sigma^2 = (W'W)^{-1}\sigma^2 \tag{3.24}$$

and

$$E\sigma^2|D' = (w-W\bar{\beta})'(w-W\bar{\beta})/(n+n_c-k). \tag{3.25}$$
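The combination of an actual and a conceptual sample in (3.23) and (3.25) can be sketched for the scalar case k = 1, where the matrix-weighted average reduces to a weighted average of two slopes. The function name and both data sets below are hypothetical and serve only to show that the weighted-average form and the stacked least squares form coincide.

```python
# Illustrative sketch of (3.23) and (3.25) for y = x * beta + u with k = 1.
# The function name and the data are hypothetical.

def combine_with_conceptual_sample(x, y, x_c, y_c):
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    cxx = sum(v * v for v in x_c)
    cxy = sum(a * b for a, b in zip(x_c, y_c))
    beta_hat = sxy / sxx      # least squares estimate from the actual sample
    beta_c = cxy / cxx        # prior mean implied by the conceptual sample
    # Last line of (3.23): average weighted by X'X and X_c'X_c.
    beta_bar = (cxx * beta_c + sxx * beta_hat) / (cxx + sxx)
    # Middle line of (3.23): least squares on the stacked system w = W beta + error.
    stacked = (cxy + sxy) / (cxx + sxx)
    rss = sum((b - beta_bar * a) ** 2 for a, b in zip(x_c + x, y_c + y))
    s2 = rss / (len(x) + len(x_c) - 1)   # (3.25) with k = 1
    return beta_hat, beta_c, beta_bar, stacked, s2


beta_hat, beta_c, beta_bar, stacked, s2 = combine_with_conceptual_sample(
    [1.0, 2.0, 3.0], [2.0, 4.0, 7.0], [1.0, 1.0], [1.0, 3.0]
)
```

The posterior mean lies between the prior mean and the least squares estimate, and the weight on the conceptual sample is controlled by the value the investigator assigns to $X_c'X_c$.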



Also maxent distributions are available for this system that incorporates a conceptual sample. 

Above are some results of applying the BMOM approach to analyze data assumed 
generated by a multiple regression or an autoregression model. In addition, posterior and 
predictive intervals can easily be computed from the above posterior and predictive densities 
by the procedures described in the Appendix. Generally these intervals are broader than
corresponding intervals based on the conditional normal posterior and predictive densities
derived above, with $\sigma^2 = s^2$. Also they are broader than conventional intervals based on
the marginal Student-t posterior and predictive densities obtained from normal
likelihood functions and diffuse prior densities for their parameters. See the Appendix for one
such comparison.
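The two-step sampling procedure described in the footnote to (3.14), drawing $\sigma^2$ from the exponential density and then $\beta$ from the conditional multivariate normal density, can be sketched as follows for k = 2. This is a hedged illustration using only the Python standard library; the function name, the seed, the number of draws and the numerical inputs are assumptions, and the hand-coded 2 x 2 Cholesky factor is a convenience for this small case.

```python
import math
import random

def draw_joint(beta_hat, xtx_inv, s2, rng):
    """One draw of (beta, sigma^2) for k = 2 using the two-step procedure."""
    sigma2 = rng.expovariate(1.0 / s2)      # exponential density with mean s^2
    # Cholesky factor of the 2x2 covariance matrix (X'X)^{-1} * sigma2.
    a = xtx_inv[0][0] * sigma2
    b = xtx_inv[0][1] * sigma2
    c = xtx_inv[1][1] * sigma2
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    beta = (beta_hat[0] + l11 * z1, beta_hat[1] + l21 * z1 + l22 * z2)
    return beta, sigma2


rng = random.Random(12345)
beta_hat = (0.5, 1.4)                       # hypothetical least squares estimates
xtx_inv = [[1.5, -0.5], [-0.5, 0.2]]        # hypothetical (X'X)^{-1}
s2 = 0.1
draws = [draw_joint(beta_hat, xtx_inv, s2, rng)[0] for _ in range(100000)]

slopes = [d[1] for d in draws]
mean_slope = sum(slopes) / len(slopes)
var_slope = sum((v - mean_slope) ** 2 for v in slopes) / len(slopes)
s_slope = math.sqrt(xtx_inv[1][1] * s2)     # s_i in (3.15)
c95 = -math.log(0.05) / math.sqrt(2.0)      # double-exponential 95% critical value
coverage = sum(abs(v - beta_hat[1]) <= c95 * s_slope for v in slopes) / len(slopes)
```

Under these assumptions the simulated slope draws should have variance close to $s_i^2$ and roughly 95% of them should fall within the double-exponential interval, which gives a simple check on intervals computed from (3.15).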

4 SUMMARY AND CONCLUDING REMARKS

In the preceding sections, posterior and predictive moments for parameters and future, 
as yet unobserved observations have been derived using one given set of data and a few 
simple assumptions. There was no need to formulate a density function for as yet 
unobserved data as a basis for a likelihood function nor to introduce a prior density for its 
parameters. Also, no use was made of Bayes' Theorem in deriving posterior and predictive 
moments. 

Then, proper posterior and predictive densities were derived by maximizing entropy 
subject to the derived moments. These densities for location or regression coefficients with 
doubly infinite ranges are in the double exponential or Laplacian form while the maxent
posterior densities for variance parameters are in the exponential form. For location
parameters with strictly positive values, the maxent posterior densities subject just to a first 
moment constraint are in the exponential form. Similar results were obtained for predictive 
densities for as yet unobserved observations. 

It should be appreciated that these maxent predictive densities, in the double exponential
or exponential form, can serve as models for as yet unobserved data and be employed in
calculation of posterior odds in model comparison and selection problems using new data.
In particular such models can be compared to those derived using a particular likelihood 
function, a prior density for its parameters and Bayes' Theorem. The two different posterior 
densities, based on analyses of a given sample of data can be employed as prior densities 
in the calculation of posterior odds using a second sample of data. The posterior odds can 
then be used to choose between the two models or to combine them using the approach 
described in Min and Zellner (1993) and Palm and Zellner (1992). In future work, such 
calculations will be reported. See Zellner (1995) for some BMOM results relating to 
multivariate regression and simultaneous equations models.

Appendix

Derivation of Double Exponential Density in (2.15)

From (2.14), $f(\theta,\sigma^2|D) = g_N(\theta|\sigma^2,D)\,g_e(\sigma^2|D)$ with $g_N(\theta|\sigma^2,D) = \sqrt{n/2\pi\sigma^2}\,
\exp\{-n(\theta-\bar{y})^2/2\sigma^2\}$ and $g_e(\sigma^2|D) = (1/s^2)\exp\{-\sigma^2/s^2\}$. Then, with $a = 1/s^2$ and $b = n(\theta-\bar{y})^2/2$,

$$\begin{aligned}
\int_0^\infty f(\theta,\sigma^2|D)\,d\sigma^2 &= a\sqrt{n/2\pi}\int_0^\infty \frac{1}{\sigma}\exp\{-[b/\sigma^2 + a\sigma^2]\}\,d\sigma^2 \\
&= \sqrt{2n/\pi}\;a\int_0^\infty \exp\{-[b/\sigma^2 + a\sigma^2]\}\,d\sigma \\
&= \sqrt{2n/\pi}\;a\left[\tfrac{1}{2}\sqrt{\pi/a}\,\exp\{-2\sqrt{ab}\}\right]
\end{aligned} \tag{A.1}$$

where the integral has been evaluated using a result given in Gradshteyn and Ryzhik (1980,
p. 307, entry 3.325). On inserting the above values for a and b in (A.1), we have

$$g(\theta|D) = \sqrt{n/2s^2}\,\exp\{-2\sqrt{n/2}\,|\theta-\bar{y}|/s\}, \qquad -\infty < \theta < \infty \tag{A.2}$$

Further, letting $w = \sqrt{n/2}\,(\theta-\bar{y})/s$, a "standardized" form of the density is

$$g(w|D) = \exp\{-2|w|\}, \qquad -\infty < w < \infty \tag{A.3}$$
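The integration step in the derivation can be checked numerically. The sketch below is an illustration under stated assumptions (function names, grid size, upper limit and test values are all hypothetical): it integrates the normal-times-exponential mixture over $\sigma^2$ with a trapezoid rule, after the substitution $\sigma^2 = t^2$ that removes the $1/\sigma$ factor, and compares the result with the closed-form double-exponential density.

```python
import math

def mixture_density(theta, ybar, s2, n, steps=20000):
    """Numerically integrate N(theta; ybar, sigma^2/n) * (1/s^2) exp(-sigma^2/s^2)
    over sigma^2 > 0, using the substitution sigma^2 = t^2."""
    a = 1.0 / s2
    b = n * (theta - ybar) ** 2 / 2.0
    upper = 12.0 * math.sqrt(s2)        # exp(-a t^2) is negligible beyond this point
    h = upper / steps
    # Trapezoid end weight at t = 0: the integrand is 1 there if b == 0, else 0.
    total = 0.5 if b == 0.0 else 0.0
    for i in range(1, steps + 1):
        t = i * h
        weight = 0.5 if i == steps else 1.0
        total += weight * math.exp(-(b / (t * t) + a * t * t))
    return math.sqrt(2.0 * n / math.pi) * a * total * h

def closed_form(theta, ybar, s2, n):
    """Closed-form double-exponential marginal posterior density."""
    s = math.sqrt(s2)
    return math.sqrt(n / (2.0 * s2)) * math.exp(
        -2.0 * math.sqrt(n / 2.0) * abs(theta - ybar) / s
    )


# Hypothetical inputs used only for the comparison.
ybar, s2, n = 6.0, 14.0 / 3.0, 4
gaps = [abs(mixture_density(t, ybar, s2, n) - closed_form(t, ybar, s2, n))
        for t in (6.0, 6.5, 8.0)]
```

Agreement to several decimal places at a handful of points is not a proof, but it is a quick guard against transcription errors in the constants of the closed form.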

Note that $\int_0^\infty \exp\{-2w\}\,dw = [-\tfrac{1}{2}\exp(-2w)]_0^\infty = 1/2$ and, given symmetry of $g(w|D)$,
$\int_{-\infty}^{\infty} g(w|D)\,dw = 1$. Also, the symmetry of $g(w|D)$ implies that all odd order moments
about zero are zero; that is, $\int_{-\infty}^{\infty} w^{2r+1}g(w|D)\,dw = 0$ for r = 0, 1, 2, .... The even order
moments are given by

$$\begin{aligned}
\mu_{2r} &= 2\int_0^\infty w^{2r}e^{-2w}\,dw \\
&= \int_0^\infty (x/2)^{2r}e^{-x}\,dx = \Gamma(2r+1)/2^{2r} \\
&= (2r)!/2^{2r}, \qquad r = 1, 2, \ldots
\end{aligned} \tag{A.4}$$

where x = 2w and $\Gamma(\cdot)$ is the gamma function. For example, $\mu_2 = 1/2$, $\mu_4 = 3/2$, etc.

Note that $w = \sqrt{n}(\theta-\bar{y})/s\sqrt{2} = z/\sqrt{2}$, where $z = \sqrt{n}(\theta-\bar{y})/s$ is a standardized normal
variable for the normal conditional posterior density for $\theta$ given $\sigma^2 = s^2$, $g_N(\theta|\sigma^2 = s^2,y)
\sim N(\bar{y}, s^2/n)$. For the conditional density, $E(z^2|\sigma^2 = s^2,D) = 1$, while for the unconditional
density, $E(z^2|D) = 2E(w^2|D) = 1$ which, surprisingly, is equal to the conditional variance.
However, $E(z^4|\sigma^2 = s^2,D) = 3$ in the conditional normal density for z, while using the
marginal density $Ez^4|D = 4Ew^4|D = 4 \cdot 3/2 = 6$, two times the value of that for the
conditional density. The fourth moment of z divided by the squared second moment,
denoted by $\mu_4/\mu_2^2$, equals 6 and the "excess" over that for the conditional normal density for z is
6-3 = 3. Thus the marginal double-exponential density for z is quite leptokurtic.

Note that $z = \sqrt{n}(\theta-\bar{y})/s$ has a univariate Student-t posterior density with $\nu = n-1$
degrees of freedom if a standard normal likelihood function for $\theta$ and $\sigma$ and a diffuse prior
$\pi(\theta,\sigma) \propto 1/\sigma$ were combined using Bayes' Theorem.$^6$ For this Student-t posterior density,
we have: $Ez|D = 0$ for $\nu > 1$, $Ez^2|D = \nu/(\nu-2)$ for $\nu > 2$, and $Ez^4|D = 3\nu^2/(\nu-2)(\nu-4)$ for $\nu
> 4$. Thus the excess for the Student-t based posterior density for z is: $\mu_4/\mu_2^2 - 3 = 6/(\nu-4)$,
for $\nu > 4$, which for $\nu > 6$ is less than 3, the excess of the double-exponential
density for z. Thus, for $n - 1 = \nu > 6$ or n > 7, the double-exponential density is more
leptokurtic. Also, as $\nu$ grows in value, the excess for the Student-t posterior density goes to
zero whereas that for the double-exponential density has a constant value equal to 3.
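The kurtosis comparison can be reproduced with a few lines of Python. The function names are hypothetical; the formulas are the even-moment expression $(2r)!/2^{2r}$ for the standardized double-exponential density and the Student-t excess $6/(\nu-4)$ quoted in the text.

```python
import math

def laplace_even_moment(r):
    """Even moment mu_{2r} of g(w|D) = exp(-2|w|): (2r)! / 2^(2r)."""
    return math.factorial(2 * r) / 2 ** (2 * r)

def laplace_excess_kurtosis():
    mu2, mu4 = laplace_even_moment(1), laplace_even_moment(2)
    return mu4 / mu2 ** 2 - 3.0

def student_t_excess_kurtosis(nu):
    """Excess kurtosis 6 / (nu - 4) of a Student-t density; requires nu > 4."""
    if nu <= 4:
        raise ValueError("the fourth moment exists only for nu > 4")
    return 6.0 / (nu - 4)


excess_laplace = laplace_excess_kurtosis()
excess_by_nu = {nu: student_t_excess_kurtosis(nu) for nu in (5, 6, 7, 10, 30)}
```

The two excess values coincide at $\nu = 6$; for smaller $\nu$ (above 4) the Student-t density is the more leptokurtic of the two, and for larger $\nu$ the double-exponential density is.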

$^6$ This well known result, probably first derived by Jeffreys, is presented in many works including Jeffreys
(1988), Berger (1985), Box and Tiao (1973), Press (1989), and Zellner (1971).

Since $z = \sqrt{n}(\theta-\bar{y})/s = \sqrt{2}\,w$, the double-exponential posterior density for z is, from
(A.3),

$$p(z|D) = (1/\sqrt{2})\exp\{-\sqrt{2}\,|z|\}, \qquad -\infty < z < \infty \tag{A.5}$$

Then for c > 0, $P_c = \Pr(c < z < \infty|D) = \int_c^\infty p(z|D)\,dz = (1/2)\exp\{-\sqrt{2}\,c\}$ and thus $\ln 2P_c
= -\sqrt{2}\,c$. For example, for $P_c = .025$, $\ln .05 = -2.9957 = -1.4142c$ and c = 2.118. Thus
$.025 = \Pr\{2.118 < z < \infty\}$ and from the symmetry of the density in (A.5), we have

$$\Pr\{-2.118 < z < 2.118|D\} = .95 \tag{A.6}$$

This 95% interval for $z = \sqrt{n}(\theta-\bar{y})/s$, -2.118 to 2.118, implies that a 95% interval for $\theta$ is
$\bar{y} \pm 2.118\,s/\sqrt{n}$. This interval is somewhat broader than a 95% interval based on the
conditional normal posterior density for $\theta$ with $\sigma^2 = s^2$, namely $\bar{y} \pm 1.96\,s/\sqrt{n}$, the widths
being $4.24\,s/\sqrt{n}$ in the former case and $3.92\,s/\sqrt{n}$ in the latter, about an 8% difference.
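The interval calculation generalizes directly to other coverage levels. The following Python sketch inverts the tail probability $(1/2)\exp\{-\sqrt{2}\,c\}$ for c; the function names and the example inputs are hypothetical, and the normal critical value 1.96 is hard-coded for the comparison.

```python
import math

def laplace_critical_value(tail_prob):
    """Solve (1/2) * exp(-sqrt(2) * c) = tail_prob for c > 0."""
    return -math.log(2.0 * tail_prob) / math.sqrt(2.0)

def bmom_interval(ybar, s, n, coverage=0.95):
    """Central posterior interval for theta from the double-exponential density."""
    c = laplace_critical_value((1.0 - coverage) / 2.0)
    half_width = c * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width


c95 = laplace_critical_value(0.025)
ratio = c95 / 1.96                    # width relative to the conditional normal interval
lower, upper = bmom_interval(6.0, 2.0, 16)   # hypothetical ybar, s and n
```

The ratio of about 1.08 corresponds to the roughly 8% difference in width noted above.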






